Dataset statistics
| Number of variables | 12 |
|---|---|
| Number of observations | 3584 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 336.1 KiB |
| Average record size in memory | 96.0 B |
Variable types
| Numeric | 9 |
|---|---|
| Categorical | 3 |
| updated_dates has a high cardinality: 512 distinct values | High cardinality |
|---|---|
| df_index is highly correlated with latitude and 4 other fields | High correlation |
| latitude is highly correlated with df_index and 2 other fields | High correlation |
| longitude is highly correlated with latitude | High correlation |
| confirmed is highly correlated with df_index and 3 other fields | High correlation |
| deaths is highly correlated with df_index and 2 other fields | High correlation |
| recovered is highly correlated with df_index and 3 other fields | High correlation |
| active is highly correlated with df_index and 1 other fields | High correlation |
| df_index is highly correlated with latitude and 5 other fields | High correlation |
| year is highly correlated with incident | High correlation |
| latitude is highly correlated with df_index and 2 other fields | High correlation |
| longitude is highly correlated with df_index and 2 other fields | High correlation |
| confirmed is highly correlated with df_index and 5 other fields | High correlation |
| deaths is highly correlated with df_index and 3 other fields | High correlation |
| recovered is highly correlated with df_index and 3 other fields | High correlation |
| active is highly correlated with df_index and 3 other fields | High correlation |
| incident is highly correlated with year | High correlation |
| df_index is highly correlated with latitude and 4 other fields | High correlation |
| latitude is highly correlated with df_index and 1 other fields | High correlation |
| longitude is highly correlated with latitude | High correlation |
| confirmed is highly correlated with df_index and 3 other fields | High correlation |
| deaths is highly correlated with df_index and 3 other fields | High correlation |
| recovered is highly correlated with df_index and 3 other fields | High correlation |
| active is highly correlated with df_index and 3 other fields | High correlation |
| active is highly correlated with recovered and 7 other fields | High correlation |
| recovered is highly correlated with active and 9 other fields | High correlation |
| df_index is highly correlated with active and 8 other fields | High correlation |
| fatality is highly correlated with active and 9 other fields | High correlation |
| confirmed is highly correlated with active and 8 other fields | High correlation |
| longitude is highly correlated with active and 8 other fields | High correlation |
| year is highly correlated with recovered and 2 other fields | High correlation |
| latitude is highly correlated with active and 8 other fields | High correlation |
| incident is highly correlated with recovered and 6 other fields | High correlation |
| deaths is highly correlated with active and 8 other fields | High correlation |
| Province is highly correlated with active and 8 other fields | High correlation |
| df_index is uniformly distributed | Uniform |
| Province is uniformly distributed | Uniform |
| updated_dates is uniformly distributed | Uniform |
| df_index has unique values | Unique |
Reproduction
| Analysis started | 2021-11-07 17:25:33.536969 |
|---|---|
| Analysis finished | 2021-11-07 17:26:00.670306 |
| Duration | 27.13 seconds |
| Software version | pandas-profiling v3.0.0 |
| Download configuration | config.json |
df_index
Real number (ℝ≥0)
HIGH CORRELATION, UNIFORM, UNIQUE
| Distinct | 3584 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1791.5 |
| Minimum | 0 |
|---|---|
| Maximum | 3583 |
| Zeros | 1 |
| Zeros (%) | < 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 28.1 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 179.15 |
| Q1 | 895.75 |
| median | 1791.5 |
| Q3 | 2687.25 |
| 95-th percentile | 3403.85 |
| Maximum | 3583 |
| Range | 3583 |
| Interquartile range (IQR) | 1791.5 |
Descriptive statistics
| Standard deviation | 1034.75601 |
|---|---|
| Coefficient of variation (CV) | 0.5775919676 |
| Kurtosis | -1.2 |
| Mean | 1791.5 |
| Median Absolute Deviation (MAD) | 896 |
| Skewness | 0 |
| Sum | 6420736 |
| Variance | 1070720 |
| Monotonicity | Strictly increasing |
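The figures above can be reproduced by hand, since df_index is simply the integers 0 to 3583. The following is a minimal sketch using only the Python standard library; the helper names `quantile` and `describe` are illustrative and not part of pandas-profiling. It assumes the sample variance (divisor n - 1) and linear interpolation for quantiles, which is consistent with the values reported in the tables.

```python
import math


def quantile(sorted_values, q):
    """Quantile by linear interpolation between the two closest ranks."""
    pos = (len(sorted_values) - 1) * q
    lower = math.floor(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] * (1 - frac) + sorted_values[upper] * frac


def describe(values):
    """Return a few of the descriptive statistics shown in the report."""
    data = sorted(values)
    n = len(data)
    mean = sum(data) / n
    # Sample variance: divide by n - 1 (assumption, matches the reported value).
    variance = sum((x - mean) ** 2 for x in data) / (n - 1)
    std = math.sqrt(variance)
    median = quantile(data, 0.5)
    # Median absolute deviation: median of |x - median|.
    mad = quantile(sorted(abs(x - median) for x in data), 0.5)
    q1, q3 = quantile(data, 0.25), quantile(data, 0.75)
    return {
        "mean": mean,
        "variance": variance,
        "std": std,
        "cv": std / mean,  # coefficient of variation
        "median": median,
        "mad": mad,
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
        "sum": sum(data),
    }


# df_index runs from 0 to 3583 in this dataset.
stats = describe(range(3584))
for name, value in stats.items():
    print(f"{name}: {value}")
```

Because the column is a plain row counter, these statistics describe the row numbering rather than the data itself.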
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0 | 1 | < 0.1% |
| 1 | 1 | < 0.1% |
| 2382 | 1 | < 0.1% |
| 2383 | 1 | < 0.1% |
| 2384 | 1 | < 0.1% |
| 2385 | 1 | < 0.1% |
| 2386 | 1 | < 0.1% |
| 2387 | 1 | < 0.1% |
| 2388 | 1 | < 0.1% |
| 2389 | 1 | < 0.1% |
| Other values (3574) | 3574 |
| Value | Count | Frequency (%) |
| 0 | 1 | |
| 1 | 1 | |
| 2 | 1 | |
| 3 | 1 | |
| 4 | 1 | |
| 5 | 1 | |
| 6 | 1 | |
| 7 | 1 | |
| 8 | 1 | |
| 9 | 1 |
| Value | Count | Frequency (%) |
| 3583 | 1 | |
| 3582 | 1 | |
| 3581 | 1 | |
| 3580 | 1 | |
| 3579 | 1 | |
| 3578 | 1 | |
| 3577 | 1 | |
| 3576 | 1 | |
| 3575 | 1 | |
| 3574 | 1 |
year
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 28.1 KiB |
| 2021 | |
|---|---|
| 2020 |
Length
| Max length | 4 |
|---|---|
| Median length | 4 |
| Mean length | 4 |
| Min length | 4 |
Characters and Unicode
| Total characters | 14336 |
|---|---|
| Distinct characters | 3 |
| Distinct categories | 1 ? |
| Distinct scripts | 1 ? |
| Distinct blocks | 1 ? |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 ? |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 2021 |
|---|---|
| 2nd row | 2021 |
| 3rd row | 2021 |
| 4th row | 2021 |
| 5th row | 2021 |
Common Values
| Value | Count | Frequency (%) |
| 2021 | 2156 | |
| 2020 | 1428 |
Length
Histogram of lengths of the category
Pie chart
| Value | Count | Frequency (%) |
| 2021 | 2156 | |
| 2020 | 1428 |
Most occurring characters
| Value | Count | Frequency (%) |
| 2 | 7168 | |
| 0 | 5012 | |
| 1 | 2156 | 15.0% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 14336 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 2 | 7168 | |
| 0 | 5012 | |
| 1 | 2156 | 15.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 14336 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 2 | 7168 | |
| 0 | 5012 | |
| 1 | 2156 | 15.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 14336 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 2 | 7168 | |
| 0 | 5012 | |
| 1 | 2156 | 15.0% |
Province
Categorical
| Distinct | 7 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 28.1 KiB |
| Sindh | |
|---|---|
| Punjab | |
| Islamabad | |
| Khyber Pakhtunkhwa | |
| Balochistan | |
| Other values (2) |
Length
| Max length | 22 |
|---|---|
| Median length | 11 |
| Mean length | 12.42857143 |
| Min length | 5 |
Characters and Unicode
| Total characters | 44544 |
|---|---|
| Distinct characters | 31 |
| Distinct categories | 4 ? |
| Distinct scripts | 2 ? |
| Distinct blocks | 1 ? |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 ? |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Sindh |
|---|---|
| 2nd row | Sindh |
| 3rd row | Sindh |
| 4th row | Sindh |
| 5th row | Sindh |
Common Values
| Value | Count | Frequency (%) |
| Sindh | 512 | |
| Punjab | 512 | |
| Islamabad | 512 | |
| Khyber Pakhtunkhwa | 512 | |
| Balochistan | 512 | |
| Azad Jammu and Kashmir | 512 | |
| Gilgit-Baltistan | 512 |
Length
Histogram of lengths of the category
Pie chart
| Value | Count | Frequency (%) |
| sindh | 512 | |
| punjab | 512 | |
| islamabad | 512 | |
| khyber | 512 | |
| pakhtunkhwa | 512 | |
| balochistan | 512 | |
| azad | 512 | |
| jammu | 512 | |
| and | 512 | |
| kashmir | 512 |
Most occurring characters
| Value | Count | Frequency (%) |
| a | 7168 | |
| i | 3072 | 6.9% |
| n | 3072 | 6.9% |
| h | 3072 | 6.9% |
| t | 2560 | 5.7% |
| d | 2048 | 4.6% |
| s | 2048 | 4.6% |
| l | 2048 | 4.6% |
| m | 2048 | 4.6% |
| (space) | 2048 | 4.6% |
| Other values (21) | 15360 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 36352 | |
| Uppercase Letter | 5632 | 12.6% |
| Space Separator | 2048 | 4.6% |
| Dash Punctuation | 512 | 1.1% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| a | 7168 | |
| i | 3072 | 8.5% |
| n | 3072 | 8.5% |
| h | 3072 | 8.5% |
| t | 2560 | 7.0% |
| d | 2048 | 5.6% |
| s | 2048 | 5.6% |
| l | 2048 | 5.6% |
| m | 2048 | 5.6% |
| u | 1536 | 4.2% |
| Other values (11) | 7680 |
Uppercase Letter
| Value | Count | Frequency (%) |
| P | 1024 | |
| K | 1024 | |
| B | 1024 | |
| S | 512 | |
| I | 512 | |
| A | 512 | |
| J | 512 | |
| G | 512 |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 2048 | |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 512 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 41984 | |
| Common | 2560 | 5.7% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| a | 7168 | |
| i | 3072 | 7.3% |
| n | 3072 | 7.3% |
| h | 3072 | 7.3% |
| t | 2560 | 6.1% |
| d | 2048 | 4.9% |
| s | 2048 | 4.9% |
| l | 2048 | 4.9% |
| m | 2048 | 4.9% |
| u | 1536 | 3.7% |
| Other values (19) | 13312 |
Common
| Value | Count | Frequency (%) |
| (space) | 2048 | |
| - | 512 | 20.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 44544 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| a | 7168 | |
| i | 3072 | 6.9% |
| n | 3072 | 6.9% |
| h | 3072 | 6.9% |
| t | 2560 | 5.7% |
| d | 2048 | 4.6% |
| s | 2048 | 4.6% |
| l | 2048 | 4.6% |
| m | 2048 | 4.6% |
| (space) | 2048 | 4.6% |
| Other values (21) | 15360 |
updated_dates
Categorical
| Distinct | 512 |
|---|---|
| Distinct (%) | 14.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 28.1 KiB |
| 2021-01-02 | 7 |
|---|---|
| 2021-01-03 | 7 |
| 2020-07-23 | 7 |
| 2020-07-22 | 7 |
| 2020-07-21 | 7 |
| Other values (507) |
Length
| Max length | 10 |
|---|---|
| Median length | 10 |
| Mean length | 10 |
| Min length | 10 |
Characters and Unicode
| Total characters | 35840 |
|---|---|
| Distinct characters | 11 |
| Distinct categories | 2 ? |
| Distinct scripts | 1 ? |
| Distinct blocks | 1 ? |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 ? |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 2021-01-02 |
|---|---|
| 2nd row | 2021-01-03 |
| 3rd row | 2021-01-04 |
| 4th row | 2021-01-05 |
| 5th row | 2021-01-06 |
Common Values
| Value | Count | Frequency (%) |
| 2021-01-02 | 7 | 0.2% |
| 2021-01-03 | 7 | 0.2% |
| 2020-07-23 | 7 | 0.2% |
| 2020-07-22 | 7 | 0.2% |
| 2020-07-21 | 7 | 0.2% |
| 2020-07-20 | 7 | 0.2% |
| 2020-07-19 | 7 | 0.2% |
| 2020-07-18 | 7 | 0.2% |
| 2020-07-17 | 7 | 0.2% |
| 2020-07-16 | 7 | 0.2% |
| Other values (502) | 3514 |
Length
Histogram of lengths of the category
| Value | Count | Frequency (%) |
| 2021-01-02 | 7 | 0.2% |
| 2021-01-03 | 7 | 0.2% |
| 2020-07-23 | 7 | 0.2% |
| 2020-07-22 | 7 | 0.2% |
| 2020-07-21 | 7 | 0.2% |
| 2020-07-20 | 7 | 0.2% |
| 2020-07-19 | 7 | 0.2% |
| 2020-07-18 | 7 | 0.2% |
| 2020-07-17 | 7 | 0.2% |
| 2020-07-16 | 7 | 0.2% |
| Other values (502) | 3514 |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 9520 | |
| 2 | 9121 | |
| - | 7168 | |
| 1 | 5110 | |
| 7 | 784 | 2.2% |
| 8 | 784 | 2.2% |
| 9 | 763 | 2.1% |
| 3 | 756 | 2.1% |
| 6 | 700 | 2.0% |
| 4 | 567 | 1.6% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 28672 | |
| Dash Punctuation | 7168 | 20.0% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 9520 | |
| 2 | 9121 | |
| 1 | 5110 | |
| 7 | 784 | 2.7% |
| 8 | 784 | 2.7% |
| 9 | 763 | 2.7% |
| 3 | 756 | 2.6% |
| 6 | 700 | 2.4% |
| 4 | 567 | 2.0% |
| 5 | 567 | 2.0% |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 7168 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 35840 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 9520 | |
| 2 | 9121 | |
| - | 7168 | |
| 1 | 5110 | |
| 7 | 784 | 2.2% |
| 8 | 784 | 2.2% |
| 9 | 763 | 2.1% |
| 3 | 756 | 2.1% |
| 6 | 700 | 2.0% |
| 4 | 567 | 1.6% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 35840 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 9520 | |
| 2 | 9121 | |
| - | 7168 | |
| 1 | 5110 | |
| 7 | 784 | 2.2% |
| 8 | 784 | 2.2% |
| 9 | 763 | 2.1% |
| 3 | 756 | 2.1% |
| 6 | 700 | 2.0% |
| 4 | 567 | 1.6% |
latitude
Real number (ℝ≥0)
| Distinct | 7 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 31.87417857 |
| Minimum | 26.009446 |
|---|---|
| Maximum | 35.792146 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 28.1 KiB |
Quantile statistics
| Minimum | 26.009446 |
|---|---|
| 5-th percentile | 26.009446 |
| Q1 | 28.328492 |
| median | 33.665087 |
| Q3 | 34.485332 |
| 95-th percentile | 35.792146 |
| Maximum | 35.792146 |
| Range | 9.7827 |
| Interquartile range (IQR) | 6.15684 |
Descriptive statistics
| Standard deviation | 3.340887503 |
|---|---|
| Coefficient of variation (CV) | 0.1048148581 |
| Kurtosis | -1.099548455 |
| Mean | 31.87417857 |
| Median Absolute Deviation (MAD) | 2.127059 |
| Skewness | -0.5897898054 |
| Sum | 114237.056 |
| Variance | 11.16152931 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=7)
| Value | Count | Frequency (%) |
| 26.009446 | 512 | |
| 30.811346 | 512 | |
| 33.665087 | 512 | |
| 34.485332 | 512 | |
| 28.328492 | 512 | |
| 34.027401 | 512 | |
| 35.792146 | 512 |
| Value | Count | Frequency (%) |
| 26.009446 | 512 | |
| 28.328492 | 512 | |
| 30.811346 | 512 | |
| 33.665087 | 512 | |
| 34.027401 | 512 | |
| 34.485332 | 512 | |
| 35.792146 | 512 |
| Value | Count | Frequency (%) |
| 35.792146 | 512 | |
| 34.485332 | 512 | |
| 34.027401 | 512 | |
| 33.665087 | 512 | |
| 30.811346 | 512 | |
| 28.328492 | 512 | |
| 26.009446 | 512 |
longitude
Real number (ℝ≥0)
| Distinct | 8 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 71.56523457 |
| Minimum | 65.898403 |
|---|---|
| Maximum | 74.982138 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 28.1 KiB |
Quantile statistics
| Minimum | 65.898403 |
|---|---|
| 5-th percentile | 65.898403 |
| Q1 | 68.776807 |
| median | 72.139132 |
| Q3 | 73.947253 |
| 95-th percentile | 74.982138 |
| Maximum | 74.982138 |
| Range | 9.083735 |
| Interquartile range (IQR) | 5.170446 |
Descriptive statistics
| Standard deviation | 2.934571521 |
|---|---|
| Coefficient of variation (CV) | 0.04100554604 |
| Kurtosis | -0.5585674741 |
| Mean | 71.56523457 |
| Median Absolute Deviation (MAD) | 1.808121 |
| Skewness | -0.8268508823 |
| Sum | 256489.8007 |
| Variance | 8.611710011 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=8)
| Value | Count | Frequency (%) |
| 68.776807 | 512 | |
| 73.121219 | 512 | |
| 72.09169 | 512 | |
| 65.898403 | 512 | |
| 73.947253 | 512 | |
| 74.982138 | 512 | |
| 72.139132 | 498 | |
| 72.139132 | 14 | 0.4% |
| Value | Count | Frequency (%) |
| 65.898403 | 512 | |
| 68.776807 | 512 | |
| 72.09169 | 512 | |
| 72.139132 | 498 | |
| 72.139132 | 14 | 0.4% |
| 73.121219 | 512 | |
| 73.947253 | 512 | |
| 74.982138 | 512 |
| Value | Count | Frequency (%) |
| 74.982138 | 512 | |
| 73.947253 | 512 | |
| 73.121219 | 512 | |
| 72.139132 | 14 | 0.4% |
| 72.139132 | 498 | |
| 72.09169 | 512 | |
| 68.776807 | 512 | |
| 65.898403 | 512 |
confirmed
Real number (ℝ≥0)
| Distinct | 3396 |
|---|---|
| Distinct (%) | 94.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 94213.90765 |
| Minimum | 444 |
|---|---|
| Maximum | 470978 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 28.1 KiB |
Quantile statistics
| Minimum | 444 |
|---|---|
| 5-th percentile | 2613 |
| Q1 | 12516.25 |
| median | 35209 |
| Q3 | 131801.25 |
| 95-th percentile | 372601.8 |
| Maximum | 470978 |
| Range | 470534 |
| Interquartile range (IQR) | 119285 |
Descriptive statistics
| Standard deviation | 118279.6815 |
|---|---|
| Coefficient of variation (CV) | 1.255437594 |
| Kurtosis | 1.670154145 |
| Mean | 94213.90765 |
| Median Absolute Deviation (MAD) | 30797 |
| Skewness | 1.613536579 |
| Sum | 337662645 |
| Variance | 1.399008306 × 10¹⁰ |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 4882 | 6 | 0.2% |
| 4959 | 6 | 0.2% |
| 4956 | 6 | 0.2% |
| 10390 | 4 | 0.1% |
| 4951 | 4 | 0.1% |
| 33220 | 3 | 0.1% |
| 34253 | 3 | 0.1% |
| 2105 | 3 | 0.1% |
| 2816 | 3 | 0.1% |
| 4909 | 3 | 0.1% |
| Other values (3386) | 3543 |
| Value | Count | Frequency (%) |
| 444 | 1 | |
| 534 | 2 | |
| 574 | 1 | |
| 647 | 1 | |
| 663 | 1 | |
| 703 | 1 | |
| 740 | 1 | |
| 769 | 1 | |
| 803 | 1 | |
| 813 | 1 |
| Value | Count | Frequency (%) |
| 470978 | 1 | |
| 470690 | 1 | |
| 470421 | 1 | |
| 470175 | 1 | |
| 469960 | 1 | |
| 469475 | 1 | |
| 469122 | 1 | |
| 468776 | 1 | |
| 468401 | 1 | |
| 468164 | 1 |
deaths
Real number (ℝ≥0)
| Distinct | 1970 |
|---|---|
| Distinct (%) | 55.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2070.7726 |
| Minimum | 9 |
|---|---|
| Maximum | 12936 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 28.1 KiB |
Quantile statistics
| Minimum | 9 |
|---|---|
| 5-th percentile | 66 |
| Q1 | 175 |
| median | 681.5 |
| Q3 | 2829.75 |
| 95-th percentile | 9019.85 |
| Maximum | 12936 |
| Range | 12927 |
| Interquartile range (IQR) | 2654.75 |
Descriptive statistics
| Standard deviation | 2894.014597 |
|---|---|
| Coefficient of variation (CV) | 1.397553066 |
| Kurtosis | 3.39571609 |
| Mean | 2070.7726 |
| Median Absolute Deviation (MAD) | 579.5 |
| Skewness | 1.935487177 |
| Sum | 7421649 |
| Variance | 8375320.489 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 102 | 50 | 1.4% |
| 107 | 43 | 1.2% |
| 103 | 40 | 1.1% |
| 186 | 36 | 1.0% |
| 101 | 27 | 0.8% |
| 175 | 24 | 0.7% |
| 145 | 24 | 0.7% |
| 740 | 22 | 0.6% |
| 111 | 21 | 0.6% |
| 182 | 18 | 0.5% |
| Other values (1960) | 3279 |
| Value | Count | Frequency (%) |
| 9 | 1 | < 0.1% |
| 10 | 2 | |
| 11 | 1 | < 0.1% |
| 13 | 3 | |
| 14 | 1 | < 0.1% |
| 15 | 4 | |
| 16 | 2 | |
| 17 | 3 | |
| 18 | 2 | |
| 19 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 12936 | 1 | |
| 12929 | 1 | |
| 12924 | 1 | |
| 12918 | 1 | |
| 12915 | 2 | |
| 12909 | 1 | |
| 12904 | 1 | |
| 12902 | 1 | |
| 12898 | 1 | |
| 12896 | 1 |
recovered
Real number (ℝ≥0)
| Distinct | 2711 |
|---|---|
| Distinct (%) | 75.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 69822.32576 |
| Minimum | 217 |
|---|---|
| Maximum | 339379 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 28.1 KiB |
Quantile statistics
| Minimum | 217 |
|---|---|
| 5-th percentile | 2243.4 |
| Q1 | 12377 |
| median | 47167.5 |
| Q3 | 81523.5 |
| 95-th percentile | 267407.15 |
| Maximum | 339379 |
| Range | 339162 |
| Interquartile range (IQR) | 69146.5 |
Descriptive statistics
| Standard deviation | 79608.88173 |
|---|---|
| Coefficient of variation (CV) | 1.140163707 |
| Kurtosis | 2.531833401 |
| Mean | 69822.32576 |
| Median Absolute Deviation (MAD) | 34695.5 |
| Skewness | 1.743747971 |
| Sum | 250243215.5 |
| Variance | 6337574050 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 69822.32576 | 637 | 17.8% |
| 4856 | 13 | 0.4% |
| 11861 | 6 | 0.2% |
| 2002 | 5 | 0.1% |
| 2212 | 5 | 0.1% |
| 4867 | 4 | 0.1% |
| 4787 | 4 | 0.1% |
| 4853 | 4 | 0.1% |
| 4795 | 4 | 0.1% |
| 4805 | 4 | 0.1% |
| Other values (2701) | 2898 |
| Value | Count | Frequency (%) |
| 217 | 1 | |
| 237 | 2 | |
| 242 | 1 | |
| 254 | 1 | |
| 264 | 1 | |
| 290 | 1 | |
| 302 | 1 | |
| 312 | 1 | |
| 317 | 1 | |
| 336 | 1 |
| Value | Count | Frequency (%) |
| 339379 | 1 | |
| 334305 | 1 | |
| 333882 | 1 | |
| 333650 | 1 | |
| 333529 | 1 | |
| 333201 | 1 | |
| 333198 | 1 | |
| 332830 | 1 | |
| 332777 | 1 | |
| 332429 | 1 |
active
Real number (ℝ≥0)
| Distinct | 2237 |
|---|---|
| Distinct (%) | 62.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 5935.375976 |
| Minimum | 0 |
|---|---|
| Maximum | 49980 |
| Zeros | 2 |
| Zeros (%) | 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 28.1 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 105.15 |
| Q1 | 664.75 |
| median | 3363.5 |
| Q3 | 5935.375976 |
| 95-th percentile | 23130.1 |
| Maximum | 49980 |
| Range | 49980 |
| Interquartile range (IQR) | 5270.625976 |
Descriptive statistics
| Standard deviation | 8359.487916 |
|---|---|
| Coefficient of variation (CV) | 1.408417588 |
| Kurtosis | 8.329876326 |
| Mean | 5935.375976 |
| Median Absolute Deviation (MAD) | 2588 |
| Skewness | 2.718163508 |
| Sum | 21272387.5 |
| Variance | 69881038.22 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 5935.375976 | 637 | 17.8% |
| 342 | 9 | 0.3% |
| 333 | 8 | 0.2% |
| 114 | 8 | 0.2% |
| 359 | 7 | 0.2% |
| 19 | 6 | 0.2% |
| 295 | 6 | 0.2% |
| 118 | 6 | 0.2% |
| 108 | 6 | 0.2% |
| 380 | 6 | 0.2% |
| Other values (2227) | 2885 |
| Value | Count | Frequency (%) |
| 0 | 2 | |
| 1 | 2 | |
| 2 | 2 | |
| 3 | 1 | < 0.1% |
| 4 | 4 | |
| 5 | 2 | |
| 6 | 2 | |
| 8 | 2 | |
| 9 | 2 | |
| 11 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 49980 | 1 | |
| 49884 | 1 | |
| 49349 | 1 | |
| 49327 | 1 | |
| 49241 | 1 | |
| 48423 | 1 | |
| 48085 | 1 | |
| 48003 | 1 | |
| 47690 | 1 | |
| 47632 | 1 |
incident
Real number (ℝ≥0)
| Distinct | 3399 |
|---|---|
| Distinct (%) | 94.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 669.9504665 |
| Minimum | 10.97552113 |
|---|---|
| Maximum | 5333.573876 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 28.1 KiB |
Quantile statistics
| Minimum | 10.97552113 |
|---|---|
| 5-th percentile | 76.07609602 |
| Q1 | 152.2592254 |
| median | 314.0531141 |
| Q3 | 608.4977425 |
| 95-th percentile | 3883.762955 |
| Maximum | 5333.573876 |
| Range | 5322.598355 |
| Interquartile range (IQR) | 456.2385171 |
Descriptive statistics
| Standard deviation | 1046.790887 |
|---|---|
| Coefficient of variation (CV) | 1.562489974 |
| Kurtosis | 8.881420128 |
| Mean | 669.9504665 |
| Median Absolute Deviation (MAD) | 192.5196925 |
| Skewness | 3.055448302 |
| Sum | 2401102.472 |
| Variance | 1095771.162 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 481.657169 | 6 | 0.2% |
| 489.253974 | 6 | 0.2% |
| 488.9579946 | 6 | 0.2% |
| 1025.075376 | 4 | 0.1% |
| 488.4646956 | 4 | 0.1% |
| 487.3794377 | 3 | 0.1% |
| 484.3209838 | 3 | 0.1% |
| 313.4181859 | 2 | 0.1% |
| 3885.930831 | 2 | 0.1% |
| 108.5673772 | 2 | 0.1% |
| Other values (3389) | 3546 |
| Value | Count | Frequency (%) |
| 10.97552113 | 1 | |
| 13.20028892 | 2 | |
| 14.18907461 | 1 | |
| 15.99360849 | 1 | |
| 16.38912276 | 1 | |
| 17.37790845 | 1 | |
| 18.29253521 | 1 | |
| 19.00940484 | 1 | |
| 19.84987267 | 1 | |
| 20.09706909 | 1 |
| Value | Count | Frequency (%) |
| 5333.573876 | 1 | |
| 5331.979117 | 1 | |
| 5329.736486 | 1 | |
| 5328.540416 | 1 | |
| 5327.29451 | 1 | |
| 5326.198113 | 1 | |
| 5324.2545 | 1 | |
| 5323.158102 | 1 | |
| 5321.363998 | 1 | |
| 5319.968583 | 1 |
fatality
Real number (ℝ≥0)
| Distinct | 3397 |
|---|---|
| Distinct (%) | 94.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2.026813803 |
| Minimum | 0.8620606061 |
|---|---|
| Maximum | 4.19907758 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 28.1 KiB |
Quantile statistics
| Minimum | 0.8620606061 |
|---|---|
| 5-th percentile | 0.9360269847 |
| Q1 | 1.126190772 |
| median | 2.029292606 |
| Q3 | 2.804357597 |
| 95-th percentile | 3.198653441 |
| Maximum | 4.19907758 |
| Range | 3.337016974 |
| Interquartile range (IQR) | 1.678166824 |
Descriptive statistics
| Standard deviation | 0.798536003 |
|---|---|
| Coefficient of variation (CV) | 0.3939858718 |
| Kurtosis | -1.260088614 |
| Mean | 2.026813803 |
| Median Absolute Deviation (MAD) | 0.8167014447 |
| Skewness | 0.1134852481 |
| Sum | 7264.100668 |
| Variance | 0.6376597481 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 2.05811138 | 6 | 0.2% |
| 2.068824252 | 6 | 0.2% |
| 2.060189861 | 4 | 0.1% |
| 1.790182868 | 4 | 0.1% |
| 2.077816256 | 3 | 0.1% |
| 2.747252747 | 3 | 0.1% |
| 2.064777328 | 3 | 0.1% |
| 2.056866304 | 3 | 0.1% |
| 1.122407483 | 2 | 0.1% |
| 0.9079949727 | 2 | 0.1% |
| Other values (3387) | 3548 |
| Value | Count | Frequency (%) |
| 0.8620606061 | 1 | |
| 0.8623936494 | 1 | |
| 0.8629001157 | 1 | |
| 0.8631186174 | 1 | |
| 0.8635628745 | 1 | |
| 0.8636139833 | 1 | |
| 0.864256341 | 1 | |
| 0.8643791964 | 1 | |
| 0.8650635567 | 1 | |
| 0.8652942192 | 1 |
| Value | Count | Frequency (%) |
| 4.19907758 | 1 | |
| 4.003293849 | 2 | |
| 3.91105696 | 1 | |
| 3.849487585 | 1 | |
| 3.830145674 | 1 | |
| 3.827414465 | 1 | |
| 3.825822997 | 1 | |
| 3.795093795 | 1 | |
| 3.76795374 | 1 | |
| 3.747293621 | 1 |
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that the slope of a linear relationship (its angle to the x-axis) does not affect the magnitude of r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
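The calculation described above can be sketched in a few lines of plain Python. This is an illustrative helper (the name `pearson_r` is an assumption, not the report's own code), applied here to the confirmed and deaths values of the first five rows shown under "First rows".

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/(n - 1) factors of the covariance and the two variances cancel,
    # so plain sums of products and squares are sufficient.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Values copied from the first five rows of the dataset (Sindh, January 2021).
confirmed = [216632, 217636, 218597, 219452, 220501]
deaths = [3582, 3594, 3611, 3623, 3634]

r = pearson_r(confirmed, deaths)
print(f"r = {r:.4f}")

# Shifting and rescaling one variable leaves r unchanged.
rescaled = [2 * d + 5 for d in deaths]
print(f"r after rescaling = {pearson_r(confirmed, rescaled):.4f}")
```

Both columns are cumulative counts that grow over time, so a value of r close to +1 is expected here and says little about any causal link.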
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
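As a sketch of the description above, ρ can be obtained by replacing each value with its rank and then applying the Pearson formula to the ranks. The helper names below are illustrative, and tied values are given their average rank, which is a common convention but an assumption here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: covariance over the product of standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation of the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


# A monotonic but nonlinear relationship: y = x ** 3.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print("Spearman:", spearman_rho(x, y))
print("Pearson: ", pearson_r(x, y))
```

On the cubic example the rank correlation is perfect, while Pearson's r falls below 1 because the relationship is not linear.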
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
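The pair-counting definition above translates directly into code. The sketch below implements exactly that simple form (often called tau-a); the function name is illustrative, and library implementations commonly apply an additional correction for ties, so their results can differ slightly when ties are present.

```python
from itertools import combinations


def kendall_tau(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1  # both variables move in the same direction
        elif direction < 0:
            discordant += 1  # the variables move in opposite directions
        # Pairs tied in x or y count as neither concordant nor discordant.
    total_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / total_pairs


# Day number versus active cases for the first ten rows of the dataset.
day = list(range(10))
active = [16916, 17365, 17556, 17959, 18290, 18432, 17295, 17755, 18488, 18564]
print("tau =", kendall_tau(day, active))
```

The quadratic number of pairs makes this form slow for thousands of rows; it is meant to illustrate the definition rather than to be used on the full dataset.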
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available.
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
First rows
| | df_index | year | Province | updated_dates | latitude | longitude | confirmed | deaths | recovered | active | incident | fatality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 2021 | Sindh | 2021-01-02 | 26.009446 | 68.776807 | 216632.0 | 3582.0 | 196134.0 | 16916.0 | 452.390614 | 1.653495 |
| 1 | 1 | 2021 | Sindh | 2021-01-03 | 26.009446 | 68.776807 | 217636.0 | 3594.0 | 196677.0 | 17365.0 | 454.487258 | 1.651381 |
| 2 | 2 | 2021 | Sindh | 2021-01-04 | 26.009446 | 68.776807 | 218597.0 | 3611.0 | 197430.0 | 17556.0 | 456.494105 | 1.651898 |
| 3 | 3 | 2021 | Sindh | 2021-01-05 | 26.009446 | 68.776807 | 219452.0 | 3623.0 | 197870.0 | 17959.0 | 458.279594 | 1.650930 |
| 4 | 4 | 2021 | Sindh | 2021-01-06 | 26.009446 | 68.776807 | 220501.0 | 3634.0 | 198577.0 | 18290.0 | 460.470211 | 1.648065 |
| 5 | 5 | 2021 | Sindh | 2021-01-07 | 26.009446 | 68.776807 | 221734.0 | 3653.0 | 199649.0 | 18432.0 | 463.045073 | 1.647469 |
| 6 | 6 | 2021 | Sindh | 2021-01-08 | 26.009446 | 68.776807 | 222999.0 | 3670.0 | 202034.0 | 17295.0 | 465.686761 | 1.645747 |
| 7 | 7 | 2021 | Sindh | 2021-01-09 | 26.009446 | 68.776807 | 224004.0 | 3679.0 | 202570.0 | 17755.0 | 467.785494 | 1.642381 |
| 8 | 8 | 2021 | Sindh | 2021-01-10 | 26.009446 | 68.776807 | 225509.0 | 3693.0 | 203328.0 | 18488.0 | 470.928371 | 1.637629 |
| 9 | 9 | 2021 | Sindh | 2021-01-11 | 26.009446 | 68.776807 | 226338.0 | 3699.0 | 204075.0 | 18564.0 | 472.659564 | 1.634281 |
Last rows
| | df_index | year | Province | updated_dates | latitude | longitude | confirmed | deaths | recovered | active | incident | fatality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3574 | 3574 | 2020 | Gilgit-Baltistan | 2020-12-22 | 35.792146 | 74.982138 | 4831.0 | 99.0 | 4636.0 | 96.0 | 476.625519 | 2.049265 |
| 3575 | 3575 | 2020 | Gilgit-Baltistan | 2020-12-23 | 35.792146 | 74.982138 | 4832.0 | 99.0 | 4639.0 | 94.0 | 476.724179 | 2.048841 |
| 3576 | 3576 | 2020 | Gilgit-Baltistan | 2020-12-24 | 35.792146 | 74.982138 | 4838.0 | 99.0 | 4649.0 | 90.0 | 477.316138 | 2.046300 |
| 3577 | 3577 | 2020 | Gilgit-Baltistan | 2020-12-25 | 35.792146 | 74.982138 | 4844.0 | 101.0 | 4671.0 | 72.0 | 477.908096 | 2.085054 |
| 3578 | 3578 | 2020 | Gilgit-Baltistan | 2020-12-26 | 35.792146 | 74.982138 | 4847.0 | 101.0 | 4677.0 | 69.0 | 478.204076 | 2.083763 |
| 3579 | 3579 | 2020 | Gilgit-Baltistan | 2020-12-27 | 35.792146 | 74.982138 | 4850.0 | 101.0 | 4683.0 | 66.0 | 478.500055 | 2.082474 |
| 3580 | 3580 | 2020 | Gilgit-Baltistan | 2020-12-28 | 35.792146 | 74.982138 | 4850.0 | 101.0 | 4686.0 | 63.0 | 478.500055 | 2.082474 |
| 3581 | 3581 | 2020 | Gilgit-Baltistan | 2020-12-29 | 35.792146 | 74.982138 | 4853.0 | 101.0 | 4696.0 | 56.0 | 478.796035 | 2.081187 |
| 3582 | 3582 | 2020 | Gilgit-Baltistan | 2020-12-30 | 35.792146 | 74.982138 | 4855.0 | 101.0 | 4700.0 | 54.0 | 478.993354 | 2.080330 |
| 3583 | 3583 | 2020 | Gilgit-Baltistan | 2020-12-31 | 35.792146 | 74.982138 | 4856.0 | 101.0 | 4705.0 | 50.0 | 479.092014 | 2.079901 |